Abstract:  Deployed  radar  and  signal  intelligence  (SIGINT)  systems  require  enormous  amounts 
of  real-time  computational  capability  that  must  adhere  to  very  confined  power,  weight  and 
volume  budgets.  Computational  requirements  have  been  increasing  as  advanced  adaptive  signal 
processing  techniques  make  their  way  from  laboratories  to  deployed  platforms.  In  many  cases 
the  computational  requirements  are  increasing  while  power,  weight  and  volume  budgets  are 
increasing only marginally if at all. Together, these trends of rising processing
requirements within existing platforms and nearly fixed physical and environmental budgets
demand ever higher levels of efficiency from commercial off-the-shelf (COTS) processing systems.
This paper addresses current trends in COTS processor designs and explores models intended
to predict how well these processors should perform per watt/kg/m^3. The models will focus
on  space-time  adaptive  processing  (STAP)  and  SIGINT  processing  requirements  and  how  they 
map  to  very  large-scale  arrays  of  general-purpose  programmable  processors. 

System  designers  are  currently  examining  various  technology  options  for  increasing  the  levels  of 
sustainable performance per watt/kg/m^3 for their applications. These options include
field-programmable gate arrays (FPGAs), alternative RISC processor architectures, and even a
possible  return  to  digital  signal  processor  (DSP)  devices.  Each  of  these  device  classes  has  an 
associated  cost  of  programmability,  flexibility,  upgradability,  and  interoperability  with  other 
devices. Device efficiency is harder to characterize: it depends on the processing algorithms
that must be performed and on how well the devices can be interconnected in a large parallel
processing system. The question no longer appears to be whether a particular device will be
applicable in high-end deployed systems, but where in the processing chain it offers the
highest value. Heterogeneous systems are a certainty in future system designs. Part of these
heterogeneous systems will continue to be large-scale processing arrays of advanced RISC
processors, which still hold much value for a large share of the emerging
application  requirements.  This  presentation  will  examine  trends  in  extracting  more  performance 
out  of  advanced  RISC  processors  in  order  to  meet  stringent  platform  environmental  and  power 
budgets. 

Although RISC-type processors are the most flexible and easiest to program of these options,
extracting performance from them is challenging. Today's processors are highly complex,
sophisticated architectures that include RISC cores, vector processing units, and multi-stage
memory hierarchies, as well as highly integrated I/O interfaces and advanced data transport
facilities. Furthermore, using these devices in practical applications still requires building
large arrays of them in complex, highly interconnected configurations. A system-wide
understanding  of  these  complex  systems  is  required  in  order  to  model  their  performance  in  radar 
and  SIGINT  applications.  The  models  will  include  the  effects  of  concurrent  accesses  of  local 
memory  systems  by  both  the  processor  and  network.  System-level  relative  performance  levels  of 
existing  and  future  RISC  processors  by  various  vendors  will  be  examined,  as  will  the  effects  of 
interconnect  technologies  and  the  network  interface  devices. 
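
As a rough illustration of the kind of node-level model described above, the following Python sketch treats processor traffic and network traffic as competing for a single local-memory bandwidth budget. The function name, its parameters, and the example numbers are hypothetical; they are not taken from the models discussed in this paper.

```python
def node_stage_time(flops, bytes_proc, bytes_net, peak_flops, mem_bw, overlap=True):
    """Estimate the time (seconds) for one processing stage on one node.

    All inputs are hypothetical illustration values:
      flops       floating-point operations in the stage
      bytes_proc  bytes the processor moves to/from local memory
      bytes_net   bytes the network interface moves to/from the same memory
      peak_flops  sustained floating-point rate of the node (flop/s)
      mem_bw      local memory bandwidth shared by processor and network (bytes/s)
    """
    compute_time = flops / peak_flops
    # Processor and network accesses share one memory system, so their
    # byte counts add against the same bandwidth budget.
    memory_time = (bytes_proc + bytes_net) / mem_bw
    if overlap:
        # Perfect overlap: the slower of the two activities sets the pace.
        return max(compute_time, memory_time)
    # No overlap: computation and data movement are serialized.
    return compute_time + memory_time


# Hypothetical stage: 1 Gflop of work, 200 MB of processor traffic,
# 100 MB of network traffic, on a 1 Gflop/s node with 1 GB/s of memory bandwidth.
overlapped = node_stage_time(1e9, 2e8, 1e8, 1e9, 1e9, overlap=True)
serialized = node_stage_time(1e9, 2e8, 1e8, 1e9, 1e9, overlap=False)
print(f"overlapped: {overlapped:.2f} s, serialized: {serialized:.2f} s")
```

In this simplified view, network traffic lengthens a stage only once the combined memory time exceeds the compute time, or when communication cannot be overlapped with processing.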

•  Provide  an  evaluation  of  the  achieved  performance  levels  of 
RISC  processing  nodes  in  radar  applications 

♦  RT_STAP  Benchmark  is  used  as  the  representative  benchmark 

♦  Evaluation  over  a  4-5  year  span  of  node  technologies 

•  Current  node  performance  (2001-2003)  against  previous  node 
performance  (1998-2000) 

•  We’ll  then  make  projections  of  performance  per  watt  for 
emerging  PPC/AltiVec-based  node  architectures 

♦  Evaluation  for  the  generation  of  technology  that  will  be  emerging 
within  the  next  1-2  years  (2003-2005) 

♦  Examine  where  the  major  bottlenecks  are  and  how  much  additional 
delivered  performance  we  may  obtain  as  we  address  these 
bottlenecks  in  a  hypothetical  node  design  (2005+) 

•  Throughput has increased by a factor of 5.6-6.4X

♦  Power per node (PPC + Network Interface) has increased by ~10% over these
generations

♦  Throughput/watt has increased by a factor of 5-5.7X

♦  Overlapping  communications  and  processing  would  get  closer  to  the  upper 
end  of  these  performance  ranges 
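
The relationship between these figures can be checked with a short calculation: the gain in throughput per watt is the throughput gain divided by the power gain. The Python sketch below assumes a power increase of roughly 10% per node; that value and the helper name `per_watt_gain` are assumptions made for illustration.

```python
def per_watt_gain(throughput_gain, power_gain):
    """Gain in throughput per watt, given throughput and power gains (ratios)."""
    return throughput_gain / power_gain


throughput_range = (5.6, 6.4)  # throughput gains quoted in the text
power_gain = 1.10              # assumption: about 10% more power per node

low, high = (per_watt_gain(t, power_gain) for t in throughput_range)
print(f"{low:.2f}X to {high:.2f}X")  # prints "5.09X to 5.82X"
```

The result is close to the quoted 5-5.7X range; the slightly lower quoted upper bound would be consistent with a power increase a little above 10%.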


[Figure legend]
PPC with AltiVec circa 2002-2003, "AltiVec Performance" RT_STAP code base
PPC with AltiVec circa 2002-2003, "AltiVec Port" RT_STAP code base
Reference data: PPC processor circa 1998-1999

Throughput per node has improved substantially at the
application level in the past few generations due to a major
architectural improvement: AltiVec

■  > 5-6.5X throughput improvement with power per node holding constant

■  Based  upon  this  one  application  which  maps  fairly  well  to  the  AltiVec  processor 


In  order  to  get  these  benefits,  heavy  investments  in  IP  (Intellectual 
Property)  must  be  made 

■  Must  investigate  bottlenecks  in  the  application  and  develop  new  routines  that 
break  these  bottlenecks  and  exploit  a  new  chip’s  architecture  (such  as  AltiVec) 

■  In  this  case  the  vendor  has  made  those  investments  in  their  middleware 

■  By and large, the application code base required only trivial changes to use these new
routines, with no major structural changes in the code; i.e., most of the burden is on
the vendor's middleware


Although good, these results indicate that we're not tracking
Moore's Law

■  In 4-5 years we are seeing only a 5-6.5X improvement, not the doubling every 1.5
years that Moore predicted

■  This  required  a  major  architectural  improvement:  AltiVec 

■  Moore  doesn’t  predict  performance  per  watt 
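
To make the comparison concrete, the sketch below computes the gain implied by a doubling every 1.5 years (the period stated above) over a 4-year and a 5-year span. The function name `doubling_gain` is introduced here for illustration only.

```python
def doubling_gain(years, doubling_period=1.5):
    """Growth factor after `years` if capability doubles every `doubling_period` years."""
    return 2.0 ** (years / doubling_period)


for years in (4, 5):
    print(f"{years} years: {doubling_gain(years):.1f}X")
# prints "4 years: 6.3X" and "5 years: 10.1X"
```

Against that 6.3-10.1X expectation, the measured 5-6.5X improvement reaches only the low end of the range.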

Performance to date with AltiVec-type RISC + SIMD
processors has measured up to expectations

♦  But, as expected, this has involved a significant IP investment in software

♦  This  investment  has  been  paid  for  by  the  vendor  and  the  application  has  been  largely 
insulated 


♦  RT_STAP  benchmark  has  shown  a  5-6.5X  improvement  in  delivered  performance  per 
watt  using  currently  available  technology 


Throughput/node  is  increasing  with  every  generation 

♦  To  date  AltiVec  has  yielded  more  than  a  quadrupling  at  the  application  level  for  a 
given  power  rating 

♦  Next-generation nodes should track clock increases, assuming I/O rates increase at
the same rate

♦  The generation after that could provide a big step improvement due to architectural
improvements in sustained I/O per node
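
The dependence on I/O rates can be illustrated with a simple serialized-time model, in which a stage's baseline time is split between computation and I/O and each part is scaled independently. This is a minimal sketch under that assumption; the 70/30 split and the scale factors are hypothetical and do not come from measured data.

```python
def stage_speedup(compute_frac, clock_scale, io_scale):
    """Speedup of a stage when compute and I/O rates scale by different factors.

    compute_frac is the (hypothetical) fraction of baseline time spent computing;
    the remainder is assumed to be I/O that is not overlapped with computation.
    """
    io_frac = 1.0 - compute_frac
    new_time = compute_frac / clock_scale + io_frac / io_scale
    return 1.0 / new_time


# Hypothetical node spending 70% of its time computing, with the clock doubled.
print(f"I/O also doubled: {stage_speedup(0.7, 2.0, 2.0):.2f}X")  # prints 2.00X
print(f"I/O unchanged:    {stage_speedup(0.7, 2.0, 1.0):.2f}X")  # prints 1.54X
```

Under these assumptions, delivered performance tracks the clock only when I/O scales with it; otherwise the unscaled I/O time limits the gain.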


Power  per  node  is  increasing 

♦  Throughput  is  absolutely  increasing  but  so  is  the  power 

♦  An improvement in the sustained I/O per node should improve the delivered
performance

♦  Challenge  for  computer  system  designers  is  packaging  these  nodes  in  dense 
systems 

♦  Challenge  for  Radar  system  designers  is  cooling  these  systems  on  their  platforms 